<!DOCTYPE html>

<html>

<head>

<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta http-equiv="X-UA-Compatible" content="IE=EDGE" />

<meta name="viewport" content="width=device-width, initial-scale=1" />

<meta name="author" content="Michael Friendly" />

<meta name="date" content="2023-10-24" />

<title>Guerry data: Multivariate Analysis</title>

<script>// Pandoc 2.9 adds attributes on both header and div. We remove the former (to
// be compatible with the behavior of Pandoc < 2.8).
document.addEventListener('DOMContentLoaded', function(e) {
  var hs = document.querySelectorAll("div.section[class*='level'] > :first-child");
  var i, h, a;
  for (i = 0; i < hs.length; i++) {
    h = hs[i];
    if (!/^h[1-6]$/i.test(h.tagName)) continue;  // it should be a header h1-h6
    a = h.attributes;
    while (a.length > 0) h.removeAttribute(a[0].name);
  }
});
</script>

<style type="text/css">
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
span.underline{text-decoration: underline;}
div.column{display: inline-block; vertical-align: top; width: 50%;}
div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
ul.task-list{list-style: none;}
</style>



<style type="text/css">
code {
white-space: pre;
}
.sourceCode {
overflow: visible;
}
</style>
<style type="text/css" data-origin="pandoc">
pre > code.sourceCode { white-space: pre; position: relative; }
pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
pre > code.sourceCode > span:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode > span { color: inherit; text-decoration: inherit; }
div.sourceCode { margin: 1em 0; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
pre > code.sourceCode { white-space: pre-wrap; }
pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
}
pre.numberSource code
{ counter-reset: source-line 0; }
pre.numberSource code > span
{ position: relative; left: -4em; counter-increment: source-line; }
pre.numberSource code > span > a:first-child::before
{ content: counter(source-line);
position: relative; left: -1em; text-align: right; vertical-align: baseline;
border: none; display: inline-block;
-webkit-touch-callout: none; -webkit-user-select: none;
-khtml-user-select: none; -moz-user-select: none;
-ms-user-select: none; user-select: none;
padding: 0 4px; width: 4em;
color: #aaaaaa;
}
pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa; padding-left: 4px; }
div.sourceCode
{ }
@media screen {
pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
}
code span.al { color: #ff0000; font-weight: bold; } 
code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } 
code span.at { color: #7d9029; } 
code span.bn { color: #40a070; } 
code span.bu { color: #008000; } 
code span.cf { color: #007020; font-weight: bold; } 
code span.ch { color: #4070a0; } 
code span.cn { color: #880000; } 
code span.co { color: #60a0b0; font-style: italic; } 
code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } 
code span.do { color: #ba2121; font-style: italic; } 
code span.dt { color: #902000; } 
code span.dv { color: #40a070; } 
code span.er { color: #ff0000; font-weight: bold; } 
code span.ex { } 
code span.fl { color: #40a070; } 
code span.fu { color: #06287e; } 
code span.im { color: #008000; font-weight: bold; } 
code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } 
code span.kw { color: #007020; font-weight: bold; } 
code span.op { color: #666666; } 
code span.ot { color: #007020; } 
code span.pp { color: #bc7a00; } 
code span.sc { color: #4070a0; } 
code span.ss { color: #bb6688; } 
code span.st { color: #4070a0; } 
code span.va { color: #19177c; } 
code span.vs { color: #4070a0; } 
code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } 
</style>
<script>
// apply pandoc div.sourceCode style to pre.sourceCode instead
(function() {
  var sheets = document.styleSheets;
  for (var i = 0; i < sheets.length; i++) {
    if (sheets[i].ownerNode.dataset["origin"] !== "pandoc") continue;
    try { var rules = sheets[i].cssRules; } catch (e) { continue; }
    var j = 0;
    while (j < rules.length) {
      var rule = rules[j];
      // check if there is a div.sourceCode rule
      if (rule.type !== rule.STYLE_RULE || rule.selectorText !== "div.sourceCode") {
        j++;
        continue;
      }
      var style = rule.style.cssText;
      // check if color or background-color is set
      if (rule.style.color === '' && rule.style.backgroundColor === '') {
        j++;
        continue;
      }
      // replace div.sourceCode by a pre.sourceCode rule
      sheets[i].deleteRule(j);
      sheets[i].insertRule('pre.sourceCode{' + style + '}', j);
    }
  }
})();
</script>



<style type="text/css">

div.csl-bib-body { }
div.csl-entry {
clear: both;
}
.hanging div.csl-entry {
margin-left:2em;
text-indent:-2em;
}
div.csl-left-margin {
min-width:2em;
float:left;
}
div.csl-right-inline {
margin-left:2em;
padding-left:1em;
}
div.csl-indent {
margin-left: 2em;
}
</style>

<style type="text/css">body {
background-color: #fff;
margin: 1em auto;
max-width: 700px;
overflow: visible;
padding-left: 2em;
padding-right: 2em;
font-family: "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
font-size: 14px;
line-height: 1.35;
}
#TOC {
clear: both;
margin: 0 0 10px 10px;
padding: 4px;
width: 400px;
border: 1px solid #CCCCCC;
border-radius: 5px;
background-color: #f6f6f6;
font-size: 13px;
line-height: 1.3;
}
#TOC .toctitle {
font-weight: bold;
font-size: 15px;
margin-left: 5px;
}
#TOC ul {
padding-left: 40px;
margin-left: -1.5em;
margin-top: 5px;
margin-bottom: 5px;
}
#TOC ul ul {
margin-left: -2em;
}
#TOC li {
line-height: 16px;
}
table {
margin: 1em auto;
border-width: 1px;
border-color: #DDDDDD;
border-style: outset;
border-collapse: collapse;
}
table th {
border-width: 2px;
padding: 5px;
border-style: inset;
}
table td {
border-width: 1px;
border-style: inset;
line-height: 18px;
padding: 5px 5px;
}
table, table th, table td {
border-left-style: none;
border-right-style: none;
}
table thead, table tr.even {
background-color: #f7f7f7;
}
p {
margin: 0.5em 0;
}
blockquote {
background-color: #f6f6f6;
padding: 0.25em 0.75em;
}
hr {
border-style: solid;
border: none;
border-top: 1px solid #777;
margin: 28px 0;
}
dl {
margin-left: 0;
}
dl dd {
margin-bottom: 13px;
margin-left: 13px;
}
dl dt {
font-weight: bold;
}
ul {
margin-top: 0;
}
ul li {
list-style: circle outside;
}
ul ul {
margin-bottom: 0;
}
pre, code {
background-color: #f7f7f7;
border-radius: 3px;
color: #333;
white-space: pre-wrap; 
}
pre {
border-radius: 3px;
margin: 5px 0px 10px 0px;
padding: 10px;
}
pre:not([class]) {
background-color: #f7f7f7;
}
code {
font-family: Consolas, Monaco, 'Courier New', monospace;
font-size: 85%;
}
p > code, li > code {
padding: 2px 0px;
}
div.figure {
text-align: center;
}
img {
background-color: #FFFFFF;
padding: 2px;
border: 1px solid #DDDDDD;
border-radius: 3px;
border: 1px solid #CCCCCC;
margin: 0 5px;
}
h1 {
margin-top: 0;
font-size: 35px;
line-height: 40px;
}
h2 {
border-bottom: 4px solid #f7f7f7;
padding-top: 10px;
padding-bottom: 2px;
font-size: 145%;
}
h3 {
border-bottom: 2px solid #f7f7f7;
padding-top: 10px;
font-size: 120%;
}
h4 {
border-bottom: 1px solid #f7f7f7;
margin-left: 8px;
font-size: 105%;
}
h5, h6 {
border-bottom: 1px solid #ccc;
font-size: 105%;
}
a {
color: #0033dd;
text-decoration: none;
}
a:hover {
color: #6666ff; }
a:visited {
color: #800080; }
a:visited:hover {
color: #BB00BB; }
a[href^="http:"] {
text-decoration: underline; }
a[href^="https:"] {
text-decoration: underline; }

code > span.kw { color: #555; font-weight: bold; } 
code > span.dt { color: #902000; } 
code > span.dv { color: #40a070; } 
code > span.bn { color: #d14; } 
code > span.fl { color: #d14; } 
code > span.ch { color: #d14; } 
code > span.st { color: #d14; } 
code > span.co { color: #888888; font-style: italic; } 
code > span.ot { color: #007020; } 
code > span.al { color: #ff0000; font-weight: bold; } 
code > span.fu { color: #900; font-weight: bold; } 
code > span.er { color: #a61717; background-color: #e3d2d2; } 
</style>




</head>

<body>




<h1 class="title toc-ignore">Guerry data: Multivariate Analysis</h1>
<h4 class="author">Michael Friendly</h4>
<h4 class="date">2023-10-24</h4>


<div id="TOC">
<ul>
<li><a href="#load-data-and-packages" id="toc-load-data-and-packages">Load data and packages</a>
<ul>
<li><a href="#guerry-data-set" id="toc-guerry-data-set"><code>Guerry</code> data set</a></li>
<li><a href="#guerrys-questions" id="toc-guerrys-questions">Guerry’s
questions</a></li>
</ul></li>
<li><a href="#multivariate-visualization-methods" id="toc-multivariate-visualization-methods">Multivariate visualization
methods</a></li>
<li><a href="#data-plots" id="toc-data-plots">Data plots</a>
<ul>
<li><a href="#density-plots" id="toc-density-plots">Density
plots</a></li>
<li><a href="#bivariate-relations" id="toc-bivariate-relations">Bivariate relations</a></li>
<li><a href="#reconnaisance-plots" id="toc-reconnaisance-plots">Reconnaissance plots</a></li>
<li><a href="#biplots" id="toc-biplots">Biplots</a></li>
</ul></li>
<li><a href="#models" id="toc-models">Models</a>
<ul>
<li><a href="#predicting-crime-univariate-regression" id="toc-predicting-crime-univariate-regression">Predicting crime:
Univariate regression</a></li>
<li><a href="#predicting-crime-multivariate-regression" id="toc-predicting-crime-multivariate-regression">Predicting crime:
Multivariate regression</a></li>
</ul></li>
<li><a href="#references" id="toc-references">References</a></li>
</ul>
</div>

<p>André-Michel Guerry’s <em>Essai sur la Statistique Morale de la
France</em> <span class="citation">(<a href="#ref-Guerry:1833">Guerry
1833</a>)</span> collected data on crimes, suicide, literacy and other
“moral statistics” for various départements in France. He provided the
first real social data analysis, using graphics and maps to summarize
this multivariate dataset. One of his main goals in this ground-breaking
study was to determine if the prevalence of crime in France could be
explained by other social variables.</p>
<p>In 1833, the scatterplot had not yet been invented; the idea of a
correlation or a regression was still 50 years in the future <span class="citation">(<a href="#ref-Galton:1886">Galton 1886</a>)</span>.
Guerry displayed his data in shaded choropleth maps and semi-graphic
tables and argued how these could be seen as implying systematic, lawful
relations among moral variables.</p>
<p>In this analysis, we ignore the spatial context of the départements
and focus on multivariate analyses of the data set.</p>
<div id="load-data-and-packages" class="section level1">
<h1>Load data and packages</h1>
<p>We will primarily use the following packages, so load them now.</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb1-1"><a href="#cb1-1" tabindex="-1"></a><span class="fu">library</span>(Guerry)         <span class="co"># Guerry data</span></span>
<span id="cb1-2"><a href="#cb1-2" tabindex="-1"></a><span class="fu">library</span>(car)            <span class="co"># better scatterplots</span></span>
<span id="cb1-3"><a href="#cb1-3" tabindex="-1"></a><span class="fu">library</span>(effects)        <span class="co"># Effect Displays for Linear Models</span></span>
<span id="cb1-4"><a href="#cb1-4" tabindex="-1"></a><span class="fu">library</span>(ggplot2)        <span class="co"># Elegant Data Visualisations Using the Grammar of Graphics</span></span>
<span id="cb1-5"><a href="#cb1-5" tabindex="-1"></a><span class="fu">library</span>(ggrepel)        <span class="co"># better handling of text labels</span></span>
<span id="cb1-6"><a href="#cb1-6" tabindex="-1"></a><span class="fu">library</span>(patchwork)      <span class="co"># combine plots</span></span>
<span id="cb1-7"><a href="#cb1-7" tabindex="-1"></a><span class="fu">library</span>(heplots)        <span class="co"># Hypothesis-Error plots</span></span>
<span id="cb1-8"><a href="#cb1-8" tabindex="-1"></a><span class="fu">library</span>(candisc)        <span class="co"># Visualizing Generalized Canonical Discriminant Analysis</span></span>
<span id="cb1-9"><a href="#cb1-9" tabindex="-1"></a><span class="fu">library</span>(dplyr)          <span class="co"># A Grammar of Data Manipulation</span></span>
<span id="cb1-10"><a href="#cb1-10" tabindex="-1"></a><span class="fu">library</span>(tidyr)          <span class="co"># Tidy Messy Data</span></span>
<span id="cb1-11"><a href="#cb1-11" tabindex="-1"></a><span class="fu">data</span>(Guerry)</span></code></pre></div>
<div id="guerry-data-set" class="section level2">
<h2><code>Guerry</code> data set</h2>
<p>Guerry’s (1833) data consisted of six main moral variables shown in
the table below. He wanted all of these to be recorded on aligned scales
so that <strong>larger</strong> numbers consistently reflected
“<strong>morally better</strong>”. Thus, four of the variables are
recorded in the inverse form, as “Population per …”.</p>
<table>
<thead>
<tr class="header">
<th align="left">Name</th>
<th align="left">Description</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left"><code>Crime_pers</code></td>
<td align="left">Population per crime against persons</td>
</tr>
<tr class="even">
<td align="left"><code>Crime_prop</code></td>
<td align="left">Population per crime against property</td>
</tr>
<tr class="odd">
<td align="left"><code>Literacy</code></td>
<td align="left">Percent of military conscripts who can read and
write</td>
</tr>
<tr class="even">
<td align="left"><code>Donations</code></td>
<td align="left">Donations to the poor</td>
</tr>
<tr class="odd">
<td align="left"><code>Infants</code></td>
<td align="left">Population per illegitimate birth</td>
</tr>
<tr class="even">
<td align="left"><code>Suicides</code></td>
<td align="left">Population per suicide</td>
</tr>
</tbody>
</table>
<p>The <code>Guerry</code> data set also contains:</p>
<ul>
<li><code>dept</code> and <code>Department</code>, the French ID numbers
and names for the 86 départements of metropolitan France in 1830,
including Corsica.</li>
<li><code>Region</code>: a factor with main levels “N”, “S”, “E”, “W”,
“C”. Corsica is coded as <code>NA</code>.</li>
<li>A collection of 14 other related variables from other sources at the
same time. See <code>?Guerry</code> for their precise definitions.</li>
</ul>
<div class="sourceCode" id="cb2"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb2-1"><a href="#cb2-1" tabindex="-1"></a><span class="fu">names</span>(Guerry)[<span class="sc">-</span>(<span class="dv">1</span><span class="sc">:</span><span class="dv">9</span>)]</span>
<span id="cb2-2"><a href="#cb2-2" tabindex="-1"></a><span class="co">#&gt;  [1] &quot;MainCity&quot;        &quot;Wealth&quot;          &quot;Commerce&quot;        &quot;Clergy&quot;         </span></span>
<span id="cb2-3"><a href="#cb2-3" tabindex="-1"></a><span class="co">#&gt;  [5] &quot;Crime_parents&quot;   &quot;Infanticide&quot;     &quot;Donation_clergy&quot; &quot;Lottery&quot;        </span></span>
<span id="cb2-4"><a href="#cb2-4" tabindex="-1"></a><span class="co">#&gt;  [9] &quot;Desertion&quot;       &quot;Instruction&quot;     &quot;Prostitutes&quot;     &quot;Distance&quot;       </span></span>
<span id="cb2-5"><a href="#cb2-5" tabindex="-1"></a><span class="co">#&gt; [13] &quot;Area&quot;            &quot;Pop1831&quot;</span></span></code></pre></div>
<p>Among these, other aspects of criminal behavior are represented by
crime against parents (<code>Crime_parents</code>),
<code>Infanticide</code> and <code>Prostitutes</code>.
<code>Clergy</code> and <code>Donation_clergy</code> are considered to
be measures of moral rectitude, potentially counteracting crime.</p>
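<p>As a quick orientation, the six main variables can be summarized
directly. This is only a minimal sketch: the vector name
<code>main_vars</code> is introduced here for illustration and is not
part of the package.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(Guerry)
data(Guerry)

# The six main moral variables, as listed in the table above
# (main_vars is an illustrative name, not part of the package)
main_vars &lt;- c("Crime_pers", "Crime_prop", "Literacy",
               "Donations", "Infants", "Suicides")

# Minimum, quartiles, mean and maximum for each variable
summary(Guerry[, main_vars])

# Rows with a missing Region (Corsica, according to the description above)
Guerry[is.na(Guerry$Region), c("dept", "Department", "Region")]</code></pre></div>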
</div>
<div id="guerrys-questions" class="section level2">
<h2>Guerry’s questions</h2>
<p>The main question that concerned Guerry was whether indicators of
crime could be shown to be related to factors that might be considered
to ameliorate crime. Among these, Guerry focused most on
<code>Literacy</code>, defined as the percentage of military conscripts who
could do more than mark an “X” on their enrollment form. A related
variable is <code>Instruction</code>, the rank recorded from Guerry’s
map; as defined, it is inversely related to <code>Literacy</code>.</p>
<dl>
<dt>Other potential explanatory variables are:</dt>
<dd>
<code>Donations</code> (a measure of donations to the poor),
</dd>
<dd>
<code>Donation_clergy</code> (a measure of donations to the clergy),
</dd>
<dd>
<code>Clergy</code> (the rank of the number of Catholic priests in active
service, per population).
</dd>
</dl>
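<p>The inverse coding of <code>Instruction</code> relative to
<code>Literacy</code> can be checked with a rank correlation. This is a
sketch only; the object name <code>r_lit_instr</code> is illustrative,
and a Spearman correlation is used here because
<code>Instruction</code> is itself a rank.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(Guerry)
data(Guerry)

# Spearman (rank) correlation between Literacy and Instruction;
# complete.obs guards against any missing values
r_lit_instr &lt;- cor(Guerry$Literacy, Guerry$Instruction,
                   method = "spearman", use = "complete.obs")

# A negative value is expected, given the inverse coding described above
r_lit_instr</code></pre></div>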
</div>
</div>
<div id="multivariate-visualization-methods" class="section level1">
<h1>Multivariate visualization methods</h1>
<p>Visualization methods for multivariate data take an enormous variety
of forms simply because more than two dimensions of data offer
exponentially increasing possibilities. It is useful to distinguish
several broad categories:</p>
<ul>
<li><p><strong>data plots</strong> : primarily plot the raw data, often
with annotations to aid interpretation (regression lines and smooths,
data ellipses, marginal distributions)</p></li>
<li><p><strong>model plots</strong> : primarily plot the results of a
fitted model, considering that the fitted model may involve more
variables than can be shown in a static 2D plot. Some examples are:
Added variable plots, effect plots, coefficient plots, …</p></li>
<li><p><strong>diagnostic plots</strong> : indicating potential problems
with the fitted model. These include residual plots, influence plots,
plots for testing homogeneity of variance and so forth.</p></li>
<li><p><strong>dimension reduction plots</strong> : plot representations
of the data in a space of fewer dimensions than the number of
variables in the data set. Simple examples include principal components
analysis (PCA) and the related biplots, and multidimensional scaling
(MDS) methods.</p></li>
</ul>
</div>
<div id="data-plots" class="section level1">
<h1>Data plots</h1>
<p>Data plots portray the data in a space where the coordinate axes are
the observed variables.</p>
<ul>
<li>1D plots include line plots, histograms and density estimates</li>
<li>2D plots are most often scatterplots, but contour plots or
hex-binned plots are also useful when the sample size is large.</li>
<li>For higher dimensions, biplots, showing the data in principal
components space, together with vectors representing the correlations
among variables, are often the most useful.</li>
</ul>
<div id="density-plots" class="section level2">
<h2>Density plots</h2>
<p>It is useful to examine the distributions of the variables and
<strong>density</strong> plots are quite informative. I want to do this
for each of the 6 main variables, so I’ll use this trick of tidy data
analysis with <code>ggplot2</code>:</p>
<ol style="list-style-type: decimal">
<li>Reshape the data from wide to long. This gives
<code>guerry_long</code>, where the different variables are in a column
labeled <code>variable</code> and the values are in
<code>value</code>.</li>
</ol>
<div class="sourceCode" id="cb3"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb3-1"><a href="#cb3-1" tabindex="-1"></a><span class="fu">data</span>(<span class="st">&quot;Guerry&quot;</span>, <span class="at">package=</span><span class="st">&quot;Guerry&quot;</span>)</span>
<span id="cb3-2"><a href="#cb3-2" tabindex="-1"></a>guerry_long <span class="ot">&lt;-</span> Guerry <span class="sc">|&gt;</span></span>
<span id="cb3-3"><a href="#cb3-3" tabindex="-1"></a>  <span class="fu">filter</span>(<span class="sc">!</span><span class="fu">is.na</span>(Region)) <span class="sc">|&gt;</span></span>
<span id="cb3-4"><a href="#cb3-4" tabindex="-1"></a>  <span class="fu">select</span>(dept<span class="sc">:</span>Suicides) <span class="sc">|&gt;</span></span>
<span id="cb3-5"><a href="#cb3-5" tabindex="-1"></a>  <span class="fu">pivot_longer</span>(<span class="at">cols =</span> Crime_pers<span class="sc">:</span>Suicides,</span>
<span id="cb3-6"><a href="#cb3-6" tabindex="-1"></a>               <span class="at">names_to =</span> <span class="st">&quot;variable&quot;</span>,</span>
<span id="cb3-7"><a href="#cb3-7" tabindex="-1"></a>               <span class="at">values_to =</span> <span class="st">&quot;value&quot;</span>)</span>
<span id="cb3-8"><a href="#cb3-8" tabindex="-1"></a>guerry_long</span>
<span id="cb3-9"><a href="#cb3-9" tabindex="-1"></a><span class="co">#&gt; # A tibble: 510 × 5</span></span>
<span id="cb3-10"><a href="#cb3-10" tabindex="-1"></a><span class="co">#&gt;     dept Region Department variable   value</span></span>
<span id="cb3-11"><a href="#cb3-11" tabindex="-1"></a><span class="co">#&gt;    &lt;int&gt; &lt;fct&gt;  &lt;fct&gt;      &lt;chr&gt;      &lt;int&gt;</span></span>
<span id="cb3-12"><a href="#cb3-12" tabindex="-1"></a><span class="co">#&gt;  1     1 E      Ain        Crime_pers 28870</span></span>
<span id="cb3-13"><a href="#cb3-13" tabindex="-1"></a><span class="co">#&gt;  2     1 E      Ain        Crime_prop 15890</span></span>
<span id="cb3-14"><a href="#cb3-14" tabindex="-1"></a><span class="co">#&gt;  3     1 E      Ain        Literacy      37</span></span>
<span id="cb3-15"><a href="#cb3-15" tabindex="-1"></a><span class="co">#&gt;  4     1 E      Ain        Donations   5098</span></span>
<span id="cb3-16"><a href="#cb3-16" tabindex="-1"></a><span class="co">#&gt;  5     1 E      Ain        Infants    33120</span></span>
<span id="cb3-17"><a href="#cb3-17" tabindex="-1"></a><span class="co">#&gt;  6     1 E      Ain        Suicides   35039</span></span>
<span id="cb3-18"><a href="#cb3-18" tabindex="-1"></a><span class="co">#&gt;  7     2 N      Aisne      Crime_pers 26226</span></span>
<span id="cb3-19"><a href="#cb3-19" tabindex="-1"></a><span class="co">#&gt;  8     2 N      Aisne      Crime_prop  5521</span></span>
<span id="cb3-20"><a href="#cb3-20" tabindex="-1"></a><span class="co">#&gt;  9     2 N      Aisne      Literacy      51</span></span>
<span id="cb3-21"><a href="#cb3-21" tabindex="-1"></a><span class="co">#&gt; 10     2 N      Aisne      Donations   8901</span></span>
<span id="cb3-22"><a href="#cb3-22" tabindex="-1"></a><span class="co">#&gt; # ℹ 500 more rows</span></span></code></pre></div>
<ol start="2" style="list-style-type: decimal">
<li>Plot the density, but make a different subplot for each variable with
<code>facet_wrap(~ variable)</code>. These plots all have different
scales for the X and Y (density) values, so it is important to use
<code>scales=&quot;free&quot;</code>. Moreover, I’m primarily interested in the
<strong>shape</strong> of these distributions, so I suppress the Y axis
tick marks and labels.</li>
</ol>
<div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb4-1"><a href="#cb4-1" tabindex="-1"></a><span class="fu">ggplot</span>(<span class="at">data =</span> guerry_long,</span>
<span id="cb4-2"><a href="#cb4-2" tabindex="-1"></a>       <span class="fu">aes</span>(<span class="at">x=</span>value, <span class="at">fill=</span><span class="cn">TRUE</span>)) <span class="sc">+</span></span>
<span id="cb4-3"><a href="#cb4-3" tabindex="-1"></a>  <span class="fu">geom_density</span>(<span class="at">alpha=</span><span class="fl">0.2</span>) <span class="sc">+</span></span>
<span id="cb4-4"><a href="#cb4-4" tabindex="-1"></a>  <span class="fu">geom_rug</span>() <span class="sc">+</span></span>
<span id="cb4-5"><a href="#cb4-5" tabindex="-1"></a>  <span class="fu">facet_wrap</span>(<span class="sc">~</span>variable, <span class="at">scales=</span><span class="st">&quot;free&quot;</span>) <span class="sc">+</span></span>
<span id="cb4-6"><a href="#cb4-6" tabindex="-1"></a>  <span class="fu">theme_bw</span>(<span class="at">base_size =</span> <span class="dv">14</span>) <span class="sc">+</span></span>
<span id="cb4-7"><a href="#cb4-7" tabindex="-1"></a>  <span class="fu">theme</span>(<span class="at">legend.position =</span> <span class="st">&quot;none&quot;</span>,</span>
<span id="cb4-8"><a href="#cb4-8" tabindex="-1"></a>        <span class="at">axis.ticks.y=</span><span class="fu">element_blank</span>(),</span>
<span id="cb4-9"><a href="#cb4-9" tabindex="-1"></a>        <span class="at">axis.text.y=</span><span class="fu">element_blank</span>())</span></code></pre></div>
<p><img src="" width="100%" /></p>
<p>You can see that all variables are positively skewed,
<code>Donations</code>, <code>Infants</code> and <code>Suicides</code>
particularly so, but not so much as to cause alarm.</p>
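<p>The visual impression of skewness can be backed up numerically. The
sketch below defines a simple moment-based skewness coefficient; the
helper <code>skewness()</code> and the vector <code>main_vars</code> are
written here for illustration and do not come from any of the packages
loaded above. Positive values indicate a longer right tail.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(Guerry)
data(Guerry)

main_vars &lt;- c("Crime_pers", "Crime_prop", "Literacy",
               "Donations", "Infants", "Suicides")

# Moment-based skewness: third central moment divided by the
# 3/2 power of the second central moment (illustrative helper)
skewness &lt;- function(x) {
  x &lt;- x[!is.na(x)]
  d &lt;- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Same rows as in the density plots: departments with a known Region
sapply(Guerry[!is.na(Guerry$Region), main_vars], skewness)</code></pre></div>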
<p>It is also of interest to see whether and how these distributions
differ according to <code>Region</code>. This is easy to do, using
<code>aes(..., fill=Region)</code>.</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb5-1"><a href="#cb5-1" tabindex="-1"></a>col.region   <span class="ot">&lt;-</span> <span class="fu">colors</span>()[<span class="fu">c</span>(<span class="dv">149</span>, <span class="dv">254</span>, <span class="dv">468</span>, <span class="dv">552</span>, <span class="dv">26</span>)] <span class="co"># colors for region</span></span>
<span id="cb5-2"><a href="#cb5-2" tabindex="-1"></a><span class="fu">ggplot</span>(<span class="at">data =</span> guerry_long,</span>
<span id="cb5-3"><a href="#cb5-3" tabindex="-1"></a>       <span class="fu">aes</span>(<span class="at">x=</span>value, <span class="at">fill=</span>Region)) <span class="sc">+</span></span>
<span id="cb5-4"><a href="#cb5-4" tabindex="-1"></a>  <span class="fu">geom_density</span>(<span class="at">alpha=</span><span class="fl">0.2</span>) <span class="sc">+</span></span>
<span id="cb5-5"><a href="#cb5-5" tabindex="-1"></a>  <span class="fu">geom_rug</span>() <span class="sc">+</span></span>
<span id="cb5-6"><a href="#cb5-6" tabindex="-1"></a>  <span class="fu">facet_wrap</span>(<span class="sc">~</span>variable, <span class="at">scales=</span><span class="st">&quot;free&quot;</span>) <span class="sc">+</span></span>
<span id="cb5-7"><a href="#cb5-7" tabindex="-1"></a>  <span class="fu">scale_fill_manual</span>(<span class="at">values=</span>col.region) <span class="sc">+</span></span>
<span id="cb5-8"><a href="#cb5-8" tabindex="-1"></a>  <span class="fu">theme_bw</span>(<span class="at">base_size =</span> <span class="dv">14</span>) <span class="sc">+</span></span>
<span id="cb5-9"><a href="#cb5-9" tabindex="-1"></a>  <span class="fu">theme</span>(<span class="at">legend.position =</span> <span class="st">&quot;bottom&quot;</span>,</span>
<span id="cb5-10"><a href="#cb5-10" tabindex="-1"></a>        <span class="at">axis.ticks.y=</span><span class="fu">element_blank</span>(),</span>
<span id="cb5-11"><a href="#cb5-11" tabindex="-1"></a>        <span class="at">axis.text.y=</span><span class="fu">element_blank</span>())</span></code></pre></div>
<p><img src="" width="100%" /></p>
<p>For some variables, like <code>Infants</code> and <code>Suicides</code>,
the differences do not seem particularly large. However, both crime
variables and <code>Literacy</code> show marked differences across
regions.</p>
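<p>A rough numerical companion to the regional density plots is a table
of medians by <code>Region</code>. This is a minimal sketch using base R;
the object name <code>region_medians</code> is illustrative.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(Guerry)
data(Guerry)

# Median of each variable within each Region.
# The formula interface omits rows with a missing Region (Corsica).
region_medians &lt;- aggregate(cbind(Crime_pers, Crime_prop, Literacy) ~ Region,
                            data = Guerry, FUN = median)
region_medians</code></pre></div>
<p>Medians are used rather than means because the distributions are
skewed.</p>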
</div>
<div id="bivariate-relations" class="section level2">
<h2>Bivariate relations</h2>
<p>Let’s start with plots of crime (<code>Crime_pers</code> and
<code>Crime_prop</code>) in relation to <code>Literacy</code>. A simple
scatterplot is not very informative. All that can be seen is that there
is not much of a relation between personal crime and literacy.</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb6-1"><a href="#cb6-1" tabindex="-1"></a><span class="fu">ggplot</span>(<span class="fu">aes</span>(<span class="at">x=</span>Literacy, <span class="at">y=</span>Crime_pers<span class="sc">/</span><span class="dv">1000</span>), <span class="at">data=</span>Guerry) <span class="sc">+</span></span>
<span id="cb6-2"><a href="#cb6-2" tabindex="-1"></a>  <span class="fu">geom_point</span>(<span class="at">size=</span><span class="dv">2</span>) </span></code></pre></div>
<p><img src="" /><!-- --></p>
<p>More useful scatterplots are annotated with additional statistical
summaries to aid interpretation:</p>
<ul>
<li>linear regression line,</li>
<li>smoothed non-parametric (loess) curve, to diagnose potential
non-linear relations,</li>
<li>data ellipses, to highlight the overall trend and variability,</li>
<li>point labels for potentially outlying or influential points.</li>
</ul>
<p>I use <code>ggplot2</code> here. It provides most of these features;
to label unusual points, however, I first calculate the squared
Mahalanobis distance of each point from the grand means.</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb7-1"><a href="#cb7-1" tabindex="-1"></a>gdf <span class="ot">&lt;-</span> Guerry[, <span class="fu">c</span>(<span class="st">&quot;Literacy&quot;</span>, <span class="st">&quot;Crime_pers&quot;</span>, <span class="st">&quot;Department&quot;</span>)]</span>
<span id="cb7-2"><a href="#cb7-2" tabindex="-1"></a>gdf<span class="sc">$</span>dsq <span class="ot">&lt;-</span> <span class="fu">mahalanobis</span>(gdf[,<span class="dv">1</span><span class="sc">:</span><span class="dv">2</span>], <span class="fu">colMeans</span>(gdf[,<span class="dv">1</span><span class="sc">:</span><span class="dv">2</span>]), <span class="fu">cov</span>(gdf[,<span class="dv">1</span><span class="sc">:</span><span class="dv">2</span>]))</span>
<span id="cb7-3"><a href="#cb7-3" tabindex="-1"></a></span>
<span id="cb7-4"><a href="#cb7-4" tabindex="-1"></a><span class="fu">ggplot</span>(<span class="fu">aes</span>(<span class="at">x=</span>Literacy, <span class="at">y=</span>Crime_pers<span class="sc">/</span><span class="dv">1000</span>, <span class="at">label=</span>Department), <span class="at">data=</span>gdf) <span class="sc">+</span></span>
<span id="cb7-5"><a href="#cb7-5" tabindex="-1"></a>  <span class="fu">geom_point</span>(<span class="at">size=</span><span class="dv">2</span>) <span class="sc">+</span></span>
<span id="cb7-6"><a href="#cb7-6" tabindex="-1"></a>  <span class="fu">stat_ellipse</span>(<span class="at">level=</span><span class="fl">0.68</span>, <span class="at">color=</span><span class="st">&quot;blue&quot;</span>, <span class="at">size=</span><span class="fl">1.2</span>) <span class="sc">+</span>  </span>
<span id="cb7-7"><a href="#cb7-7" tabindex="-1"></a>  <span class="fu">stat_ellipse</span>(<span class="at">level=</span><span class="fl">0.95</span>, <span class="at">color=</span><span class="st">&quot;gray&quot;</span>, <span class="at">size=</span><span class="dv">1</span>, <span class="at">linetype=</span><span class="dv">2</span>) <span class="sc">+</span> </span>
<span id="cb7-8"><a href="#cb7-8" tabindex="-1"></a>  <span class="fu">geom_smooth</span>(<span class="at">method=</span><span class="st">&quot;lm&quot;</span>, <span class="at">formula=</span>y<span class="sc">~</span>x, <span class="at">fill=</span><span class="st">&quot;lightblue&quot;</span>) <span class="sc">+</span></span>
<span id="cb7-9"><a href="#cb7-9" tabindex="-1"></a>  <span class="fu">geom_smooth</span>(<span class="at">method=</span><span class="st">&quot;loess&quot;</span>, <span class="at">formula=</span>y<span class="sc">~</span>x, <span class="at">color=</span><span class="st">&quot;red&quot;</span>, <span class="at">se=</span><span class="cn">FALSE</span>) <span class="sc">+</span></span>
<span id="cb7-10"><a href="#cb7-10" tabindex="-1"></a>  <span class="fu">geom_label_repel</span>(<span class="at">data =</span> gdf[gdf<span class="sc">$</span>dsq <span class="sc">&gt;</span> <span class="fl">4.6</span>,]) <span class="sc">+</span></span>
<span id="cb7-11"><a href="#cb7-11" tabindex="-1"></a>  <span class="fu">theme_bw</span>()</span></code></pre></div>
<p><img src="" /><!-- --></p>
<p>The flat (blue) regression line and the nearly circular data ellipses
show that the correlation is nearly zero; the smoothed (red) curve
indicates that there is no tendency for a nonlinear relation.</p>
<p>Doing the same for crimes against property:</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb8-1"><a href="#cb8-1" tabindex="-1"></a>gdf <span class="ot">&lt;-</span> Guerry[, <span class="fu">c</span>(<span class="st">&quot;Literacy&quot;</span>, <span class="st">&quot;Crime_prop&quot;</span>, <span class="st">&quot;Department&quot;</span>)]</span>
<span id="cb8-2"><a href="#cb8-2" tabindex="-1"></a>gdf<span class="sc">$</span>dsq <span class="ot">&lt;-</span> <span class="fu">mahalanobis</span>(gdf[,<span class="dv">1</span><span class="sc">:</span><span class="dv">2</span>], <span class="fu">colMeans</span>(gdf[,<span class="dv">1</span><span class="sc">:</span><span class="dv">2</span>]), <span class="fu">cov</span>(gdf[,<span class="dv">1</span><span class="sc">:</span><span class="dv">2</span>]))</span>
<span id="cb8-3"><a href="#cb8-3" tabindex="-1"></a></span>
<span id="cb8-4"><a href="#cb8-4" tabindex="-1"></a><span class="fu">ggplot</span>(<span class="fu">aes</span>(<span class="at">x=</span>Literacy, <span class="at">y=</span>Crime_prop<span class="sc">/</span><span class="dv">1000</span>, <span class="at">label=</span>Department), <span class="at">data=</span>gdf) <span class="sc">+</span></span>
<span id="cb8-5"><a href="#cb8-5" tabindex="-1"></a>  <span class="fu">geom_point</span>(<span class="at">size=</span><span class="dv">2</span>) <span class="sc">+</span></span>
<span id="cb8-6"><a href="#cb8-6" tabindex="-1"></a>  <span class="fu">stat_ellipse</span>(<span class="at">level=</span><span class="fl">0.68</span>, <span class="at">color=</span><span class="st">&quot;blue&quot;</span>, <span class="at">size=</span><span class="fl">1.2</span>) <span class="sc">+</span>  </span>
<span id="cb8-7"><a href="#cb8-7" tabindex="-1"></a>  <span class="fu">stat_ellipse</span>(<span class="at">level=</span><span class="fl">0.95</span>, <span class="at">color=</span><span class="st">&quot;gray&quot;</span>, <span class="at">size=</span><span class="dv">1</span>, <span class="at">linetype=</span><span class="dv">2</span>) <span class="sc">+</span> </span>
<span id="cb8-8"><a href="#cb8-8" tabindex="-1"></a>  <span class="fu">geom_smooth</span>(<span class="at">method=</span><span class="st">&quot;lm&quot;</span>, <span class="at">formula=</span>y<span class="sc">~</span>x, <span class="at">fill=</span><span class="st">&quot;lightblue&quot;</span>) <span class="sc">+</span></span>
<span id="cb8-9"><a href="#cb8-9" tabindex="-1"></a>  <span class="fu">geom_smooth</span>(<span class="at">method=</span><span class="st">&quot;loess&quot;</span>, <span class="at">formula=</span>y<span class="sc">~</span>x, <span class="at">color=</span><span class="st">&quot;red&quot;</span>, <span class="at">se=</span><span class="cn">FALSE</span>) <span class="sc">+</span></span>
<span id="cb8-10"><a href="#cb8-10" tabindex="-1"></a>  <span class="fu">geom_label_repel</span>(<span class="at">data =</span> gdf[gdf<span class="sc">$</span>dsq <span class="sc">&gt;</span> <span class="fl">4.6</span>,]) <span class="sc">+</span></span>
<span id="cb8-11"><a href="#cb8-11" tabindex="-1"></a>  <span class="fu">theme_bw</span>()</span></code></pre></div>
<p><img src="" /><!-- --></p>
<p>So, somewhat surprisingly, increased literacy is associated with an
increase in property crime (a smaller population per crime), in contrast
to personal crime, which seems unrelated to literacy.
Creuse again stands out as an unusual point, one that is likely to be
influential in regression models.</p>
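<p>Both plots label only the points whose squared Mahalanobis distance
exceeds 4.6. That value appears to correspond to the 0.90 quantile of a
<span class="math inline">\(\chi^2_2\)</span> distribution, which is the
reference distribution for squared distances of bivariate normal data.
The sketch below checks this on simulated data; the names
<code>cutoff</code>, <code>X</code> and <code>dsq</code> are
illustrative only and are not part of the analysis above.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Candidate cutoff: the 0.90 quantile of a chi-square with 2 df
cutoff &lt;- qchisq(0.90, df = 2)
round(cutoff, 1)

# Illustration on simulated bivariate normal data (not the Guerry data)
set.seed(42)
X &lt;- cbind(x = rnorm(1000), y = rnorm(1000))
dsq &lt;- mahalanobis(X, colMeans(X), cov(X))

# Share of points that would be labelled; about 0.10 is expected
mean(dsq &gt; cutoff)</code></pre></div>
<p>Under this reading, roughly one point in ten would be labelled if the
data were bivariate normal, which keeps the labels to the most unusual
departments.</p>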
</div>
<div id="reconnaisance-plots" class="section level2">
<h2>Reconnaissance plots</h2>
<p>Reconnaissance plots attempt to give a bird’s-eye overview of a
multivariate data set. For example, to see the relations among more than
two variables we could turn to a scatterplot matrix or some other
display to show all pairwise bivariate relations.</p>
<p>For these, my preferred package is <code>car</code> <span class="citation">(<a href="#ref-R-car">John Fox, Weisberg, and Price
2023</a>)</span> with the <code>scatterplotMatrix</code> function.
<code>GGally</code> <span class="citation">(<a href="#ref-R-GGally">Schloerke et al. 2021</a>)</span> works within the
<code>ggplot2</code> framework, but doesn’t have the flexibility I’d
like.</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb9-1"><a href="#cb9-1" tabindex="-1"></a><span class="fu">library</span>(car)          <span class="co"># Companion to Applied Regression</span></span>
<span id="cb9-2"><a href="#cb9-2" tabindex="-1"></a><span class="fu">scatterplotMatrix</span>(Guerry[,<span class="dv">4</span><span class="sc">:</span><span class="dv">9</span>],</span>
<span id="cb9-3"><a href="#cb9-3" tabindex="-1"></a>                  <span class="at">ellipse=</span><span class="fu">list</span>(<span class="at">levels=</span><span class="fl">0.68</span>), </span>
<span id="cb9-4"><a href="#cb9-4" tabindex="-1"></a>                  <span class="at">smooth=</span><span class="cn">FALSE</span>)</span></code></pre></div>
<p><img src="" width="100%" /></p>
<div id="corrgrams" class="section level3">
<h3>Corrgrams</h3>
<p>Sometimes, particularly with more variables than this, we want to see
a more schematic overview.
A <em>correlation diagram</em> or “corrgram” <span class="citation">(<a href="#ref-Friendly:02:corrgram">Friendly 2002</a>)</span> is a graphic
display of a correlation matrix, allowing different renderings of the
correlation between each pair of variables: as a shaded box, a pie
symbol, a schematic data ellipse, and other options. This is implemented
in the <code>corrgram</code> package <span class="citation">(<a href="#ref-R-corrgram">Wright 2021</a>)</span>. The panels in the upper
and lower triangles can be rendered differently.</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb10-1"><a href="#cb10-1" tabindex="-1"></a><span class="fu">library</span>(corrgram)             <span class="co"># Plot a Correlogram</span></span>
<span id="cb10-2"><a href="#cb10-2" tabindex="-1"></a><span class="fu">corrgram</span>(Guerry[,<span class="dv">4</span><span class="sc">:</span><span class="dv">9</span>], <span class="at">upper=</span>panel.pie)</span></code></pre></div>
<p><img src="" /><!-- --></p>
<p>Or, the data in each pairwise tile can be rendered with data ellipses
and smoothed curves to show possible nonlinear relations.</p>
<p>Another feature is that the rows/column variables can be permuted to
put similar variables together, using the <code>order</code> option,
which arranges the variables according to similarity of their
correlations.</p>
<div class="sourceCode" id="cb11"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb11-1"><a href="#cb11-1" tabindex="-1"></a><span class="fu">corrgram</span>(Guerry[,<span class="dv">4</span><span class="sc">:</span><span class="dv">9</span>], </span>
<span id="cb11-2"><a href="#cb11-2" tabindex="-1"></a>         <span class="at">upper=</span>panel.ellipse, </span>
<span id="cb11-3"><a href="#cb11-3" tabindex="-1"></a>         <span class="at">order=</span><span class="cn">TRUE</span>,</span>
<span id="cb11-4"><a href="#cb11-4" tabindex="-1"></a>         <span class="at">lwd=</span><span class="dv">2</span>)</span></code></pre></div>
<p><img src="" /><!-- --></p>
<p>Here, there are a number of pairwise plots that appear markedly
nonlinear. For the main crime variables, the most nonlinear are those of
personal crime vs. donations to the poor, and property crime vs. infants
born out of wedlock and suicides. <code>Literacy</code> stands out here
as having negative relations with all other variables.</p>
<p>An alternative analysis might include:</p>
<ul>
<li>converting the data to ranks</li>
<li>considering transformations of some of the variables</li>
</ul>
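<p>As a minimal sketch of the first suggestion, the variables can be
replaced by their ranks before drawing the corrgram, so that the display
reflects monotone rather than strictly linear association. The name
<code>granks</code> is introduced here for illustration, and the code
assumes the <code>Guerry</code> and <code>corrgram</code> packages are
installed.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(Guerry)     # provides the Guerry data set
library(corrgram)

data(Guerry)
# Replace each variable by its ranks (ties get average ranks)
granks &lt;- as.data.frame(lapply(Guerry[, 4:9], rank))

# Same display as above, now based on the rank-transformed data
corrgram(granks, upper = panel.ellipse, order = TRUE, lwd = 2)

# Pearson correlations of the ranks are the Spearman correlations of the data
all.equal(cor(granks), cor(Guerry[, 4:9], method = "spearman"))</code></pre></div>
<p>Working with ranks is one simple option; transformations such as logs
of individual variables would be an alternative route to the same
goal.</p>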
</div>
</div>
<div id="biplots" class="section level2">
<h2>Biplots</h2>
<p>Rather than viewing the data in <strong>data space</strong>, a biplot
shows the data in the <strong>reduced-rank PCA space</strong> that
explains most of the variation of the observations. This is essentially
a plot of the observation scores on the first two principal components,
overlaid with vectors representing the variables projected into PCA
space.</p>
<p>First, we use <code>prcomp()</code> to carry out the PCA. We’d like
to visualize the result in relation to <code>Region</code>, so delete
Corsica where <code>Region</code> is missing.</p>
<div class="sourceCode" id="cb12"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb12-1"><a href="#cb12-1" tabindex="-1"></a>gdata <span class="ot">&lt;-</span> Guerry <span class="sc">|&gt;</span></span>
<span id="cb12-2"><a href="#cb12-2" tabindex="-1"></a>  <span class="fu">select</span>(Region, Crime_pers<span class="sc">:</span>Suicides) <span class="sc">|&gt;</span>   <span class="co"># keep only main variables</span></span>
<span id="cb12-3"><a href="#cb12-3" tabindex="-1"></a>  <span class="fu">filter</span>(<span class="sc">!</span><span class="fu">is.na</span>(Region))                   <span class="co"># delete Corsica (Region==NA)</span></span>
<span id="cb12-4"><a href="#cb12-4" tabindex="-1"></a></span>
<span id="cb12-5"><a href="#cb12-5" tabindex="-1"></a>guerry.pca <span class="ot">&lt;-</span> gdata <span class="sc">|&gt;</span></span>
<span id="cb12-6"><a href="#cb12-6" tabindex="-1"></a>  <span class="fu">select</span>(<span class="sc">-</span>Region) <span class="sc">|&gt;</span></span>
<span id="cb12-7"><a href="#cb12-7" tabindex="-1"></a>  <span class="fu">prcomp</span>(<span class="at">scale =</span> <span class="cn">TRUE</span>)</span>
<span id="cb12-8"><a href="#cb12-8" tabindex="-1"></a></span>
<span id="cb12-9"><a href="#cb12-9" tabindex="-1"></a><span class="fu">print</span>(guerry.pca, <span class="at">digits=</span><span class="dv">3</span>)</span>
<span id="cb12-10"><a href="#cb12-10" tabindex="-1"></a><span class="co">#&gt; Standard deviations (1, .., p=6):</span></span>
<span id="cb12-11"><a href="#cb12-11" tabindex="-1"></a><span class="co">#&gt; [1] 1.463 1.096 1.050 0.817 0.741 0.584</span></span>
<span id="cb12-12"><a href="#cb12-12" tabindex="-1"></a><span class="co">#&gt; </span></span>
<span id="cb12-13"><a href="#cb12-13" tabindex="-1"></a><span class="co">#&gt; Rotation (n x k) = (6 x 6):</span></span>
<span id="cb12-14"><a href="#cb12-14" tabindex="-1"></a><span class="co">#&gt;                PC1     PC2     PC3      PC4     PC5     PC6</span></span>
<span id="cb12-15"><a href="#cb12-15" tabindex="-1"></a><span class="co">#&gt; Crime_pers -0.0659  0.5906 -0.6732  0.13973 -0.0102 -0.4172</span></span>
<span id="cb12-16"><a href="#cb12-16" tabindex="-1"></a><span class="co">#&gt; Crime_prop -0.5123 -0.0884 -0.4765 -0.09861  0.1381  0.6884</span></span>
<span id="cb12-17"><a href="#cb12-17" tabindex="-1"></a><span class="co">#&gt; Literacy    0.5118 -0.1294 -0.2090  0.00797  0.8213  0.0560</span></span>
<span id="cb12-18"><a href="#cb12-18" tabindex="-1"></a><span class="co">#&gt; Donations  -0.1062  0.6990  0.4134 -0.47298  0.2742  0.1741</span></span>
<span id="cb12-19"><a href="#cb12-19" tabindex="-1"></a><span class="co">#&gt; Infants    -0.4513  0.1033  0.3238  0.73031  0.3776 -0.0696</span></span>
<span id="cb12-20"><a href="#cb12-20" tabindex="-1"></a><span class="co">#&gt; Suicides   -0.5063 -0.3569 -0.0169 -0.46220  0.2976 -0.5602</span></span></code></pre></div>
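<p>The proportion of variance accounted for by each component follows
from the squared standard deviations. The sketch below recomputes it
from the values printed above, so it runs on its own; the names
<code>sdev</code> and <code>pve</code> are illustrative. With the fitted
object at hand, <code>guerry.pca$sdev</code> would be used instead.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Standard deviations of the components, copied from the printed output above
sdev &lt;- c(1.463, 1.096, 1.050, 0.817, 0.741, 0.584)

# Each component's variance is its squared standard deviation
pve &lt;- sdev^2 / sum(sdev^2)

round(100 * pve, 1)          # percent of variance per component
round(100 * cumsum(pve), 1)  # cumulative percent</code></pre></div>
<p>The first two values are the percentages that label the axes of a
two-dimensional biplot of these data.</p>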
<!-- A screeplot shows the proportions of variance accounted for by each component. The results show that only 65% -->
<!-- of the variance is accounted for in two dimensions. -->
<!-- ```{r fig.height=4, fig.width=7} -->
<!-- ggs1 <- ggscreeplot(guerry.pca) + theme_bw() + geom_point(size=4) -->
<!-- ggs2 <- ggscreeplot(guerry.pca, type="cev") + theme_bw() + geom_point(size=4) -->
<!-- ggs1 + ggs2 -->
<!-- ``` -->
<p>In the <code>ggplot2</code> framework, biplots can be produced by the
<code>ggbiplot</code> package <span class="citation">(<a href="#ref-R-ggbiplot">Vu and Friendly 2023</a>)</span>, but this
package is not on CRAN, so cannot be directly used in this vignette.
Instead, the code below was run locally and the result included.</p>
<div class="sourceCode" id="cb13"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb13-1"><a href="#cb13-1" tabindex="-1"></a><span class="cf">if</span>(<span class="sc">!</span><span class="fu">require</span>(ggbiplot)) remotes<span class="sc">::</span><span class="fu">install_github</span>(<span class="st">&quot;vqv/ggbiplot&quot;</span>)</span>
<span id="cb13-2"><a href="#cb13-2" tabindex="-1"></a><span class="fu">library</span>(ggbiplot) <span class="co"># A ggplot2 based biplot</span></span>
<span id="cb13-3"><a href="#cb13-3" tabindex="-1"></a><span class="fu">ggbiplot</span>(guerry.pca, <span class="at">groups=</span>gdata<span class="sc">$</span>Region, </span>
<span id="cb13-4"><a href="#cb13-4" tabindex="-1"></a>         <span class="at">ellipse=</span><span class="cn">TRUE</span>,</span>
<span id="cb13-5"><a href="#cb13-5" tabindex="-1"></a>         <span class="at">var.scale =</span> <span class="dv">3</span>, <span class="at">varname.size =</span> <span class="dv">5</span>) <span class="sc">+</span> </span>
<span id="cb13-6"><a href="#cb13-6" tabindex="-1"></a>  <span class="fu">theme_bw</span>() <span class="sc">+</span> </span>
<span id="cb13-7"><a href="#cb13-7" tabindex="-1"></a>  <span class="fu">labs</span>(<span class="at">color=</span><span class="st">&quot;Region&quot;</span>) <span class="sc">+</span></span>
<span id="cb13-8"><a href="#cb13-8" tabindex="-1"></a>  <span class="fu">theme</span>(<span class="at">legend.position =</span> <span class="fu">c</span>(<span class="fl">0.1</span>, <span class="fl">0.8</span>))</span></code></pre></div>
<div class="sourceCode" id="cb14"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb14-1"><a href="#cb14-1" tabindex="-1"></a>knitr<span class="sc">::</span><span class="fu">include_graphics</span>(<span class="st">&quot;figures/ggbiplot.png&quot;</span>)</span></code></pre></div>
<p><img src="" width="756" /></p>
<p>This is OK, but there are many features of such plots that cannot be
customized (line widths, colors, … ). I prefer those created using the
<code>heplots</code> package.</p>
<div class="sourceCode" id="cb15"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb15-1"><a href="#cb15-1" tabindex="-1"></a>op <span class="ot">&lt;-</span> <span class="fu">par</span>(<span class="at">mar=</span><span class="fu">c</span>(<span class="dv">5</span>,<span class="dv">4</span>,<span class="dv">1</span>,<span class="dv">1</span>)<span class="sc">+</span>.<span class="dv">1</span>)</span>
<span id="cb15-2"><a href="#cb15-2" tabindex="-1"></a>cols <span class="ot">=</span> colorspace<span class="sc">::</span><span class="fu">rainbow_hcl</span>(<span class="dv">5</span>)</span>
<span id="cb15-3"><a href="#cb15-3" tabindex="-1"></a><span class="fu">covEllipses</span>(guerry.pca<span class="sc">$</span>x, </span>
<span id="cb15-4"><a href="#cb15-4" tabindex="-1"></a>            <span class="at">group=</span>gdata<span class="sc">$</span>Region, </span>
<span id="cb15-5"><a href="#cb15-5" tabindex="-1"></a>            <span class="at">pooled=</span><span class="cn">FALSE</span>, </span>
<span id="cb15-6"><a href="#cb15-6" tabindex="-1"></a>            <span class="at">fill=</span><span class="cn">TRUE</span>, <span class="at">fill.alpha=</span><span class="fl">0.1</span>,</span>
<span id="cb15-7"><a href="#cb15-7" tabindex="-1"></a>            <span class="at">col=</span>cols, </span>
<span id="cb15-8"><a href="#cb15-8" tabindex="-1"></a>            <span class="at">label.pos=</span><span class="fu">c</span>(<span class="dv">3</span>,<span class="dv">0</span>,<span class="dv">1</span>,<span class="dv">1</span>,<span class="dv">3</span>), </span>
<span id="cb15-9"><a href="#cb15-9" tabindex="-1"></a>            <span class="at">cex=</span><span class="dv">2</span>,</span>
<span id="cb15-10"><a href="#cb15-10" tabindex="-1"></a>            <span class="at">xlim=</span><span class="fu">c</span>(<span class="sc">-</span><span class="dv">4</span>,<span class="dv">4</span>), <span class="at">ylim=</span><span class="fu">c</span>(<span class="sc">-</span><span class="dv">4</span>,<span class="dv">4</span>),</span>
<span id="cb15-11"><a href="#cb15-11" tabindex="-1"></a>            <span class="at">xlab =</span> <span class="st">&quot;Dimension 1 (35.7 %)&quot;</span>, </span>
<span id="cb15-12"><a href="#cb15-12" tabindex="-1"></a>            <span class="at">ylab =</span> <span class="st">&quot;Dimension 2 (20.0 %)&quot;</span>,</span>
<span id="cb15-13"><a href="#cb15-13" tabindex="-1"></a>            <span class="at">cex.lab=</span><span class="fl">1.4</span></span>
<span id="cb15-14"><a href="#cb15-14" tabindex="-1"></a>            )</span>
<span id="cb15-15"><a href="#cb15-15" tabindex="-1"></a><span class="fu">points</span>(guerry.pca<span class="sc">$</span>x, <span class="at">pch=</span>(<span class="dv">15</span><span class="sc">:</span><span class="dv">19</span>)[gdata<span class="sc">$</span>Region], <span class="at">col=</span>cols[gdata<span class="sc">$</span>Region])</span>
<span id="cb15-16"><a href="#cb15-16" tabindex="-1"></a></span>
<span id="cb15-17"><a href="#cb15-17" tabindex="-1"></a>candisc<span class="sc">::</span><span class="fu">vectors</span>(guerry.pca<span class="sc">$</span>rotation, <span class="at">scale=</span><span class="dv">5</span>,  </span>
<span id="cb15-18"><a href="#cb15-18" tabindex="-1"></a>                 <span class="at">col=</span><span class="st">&quot;black&quot;</span>, <span class="at">lwd=</span><span class="dv">3</span>, <span class="at">cex=</span><span class="fl">1.4</span>, </span>
<span id="cb15-19"><a href="#cb15-19" tabindex="-1"></a>                 <span class="at">pos =</span> <span class="fu">c</span>(<span class="dv">4</span>,<span class="dv">2</span>,<span class="dv">4</span>,<span class="dv">2</span>,<span class="dv">2</span>,<span class="dv">2</span>),</span>
<span id="cb15-20"><a href="#cb15-20" tabindex="-1"></a>                 <span class="at">xpd=</span><span class="cn">TRUE</span>)</span>
<span id="cb15-21"><a href="#cb15-21" tabindex="-1"></a><span class="fu">abline</span>(<span class="at">h=</span><span class="dv">0</span>, <span class="at">v=</span><span class="dv">0</span>, <span class="at">col=</span><span class="fu">gray</span>(.<span class="dv">70</span>))</span></code></pre></div>
<p><img src="" width="95%" /></p>
<p>An interpretation can be read from both the directions of the
variable arrows and the relative positions of the ellipses representing
the scatter of the component scores for the different regions.</p>
<ul>
<li>The first component is largely aligned positively with
<code>Literacy</code> and negatively with property crime, suicides and
children born out of wedlock (<code>Infants</code>)</li>
<li>The second dimension reflects mainly the correlation of personal
crime and donations to the poor.</li>
<li>The South region is generally lower on PC2, the West, generally
higher.</li>
<li>The North stands out as being higher than the others on PC1; the
West somewhat higher on PC2.</li>
</ul>
</div>
</div>
<div id="models" class="section level1">
<h1>Models</h1>
<p>Here we illustrate:</p>
<ul>
<li>Model-based plots for linear regression models predicting personal
crime and property crime</li>
<li>Multivariate analysis of variance (MANOVA) and HE plots for the
joint relation of the crime variables to other predictors.</li>
</ul>
<div id="predicting-crime-univariate-regression" class="section level2">
<h2>Predicting crime: Univariate regression</h2>
<p>The simplest approach to predicting the crime variables would be to
fit a separate multiple regression to each.</p>
<div class="sourceCode" id="cb16"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb16-1"><a href="#cb16-1" tabindex="-1"></a>crime.mod1 <span class="ot">&lt;-</span> <span class="fu">lm</span>(Crime_pers <span class="sc">~</span>  Region <span class="sc">+</span> Literacy <span class="sc">+</span> Donations <span class="sc">+</span>  Infants <span class="sc">+</span> Suicides, <span class="at">data=</span>Guerry)</span>
<span id="cb16-2"><a href="#cb16-2" tabindex="-1"></a>crime.mod2 <span class="ot">&lt;-</span> <span class="fu">lm</span>(Crime_prop <span class="sc">~</span>  Region <span class="sc">+</span> Literacy <span class="sc">+</span> Donations <span class="sc">+</span>  Infants <span class="sc">+</span> Suicides, <span class="at">data=</span>Guerry)</span></code></pre></div>
<p>Tests for the predictors are best obtained using
<code>car::Anova()</code>, which gives <strong>partial</strong> (Type II)
tests, adjusting for other predictors, rather than the
<strong>sequential</strong> (Type I) tests provided by
<code>stats::anova()</code>.</p>
<div class="sourceCode" id="cb17"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb17-1"><a href="#cb17-1" tabindex="-1"></a><span class="fu">Anova</span>(crime.mod1)</span>
<span id="cb17-2"><a href="#cb17-2" tabindex="-1"></a><span class="co">#&gt; Anova Table (Type II tests)</span></span>
<span id="cb17-3"><a href="#cb17-3" tabindex="-1"></a><span class="co">#&gt; </span></span>
<span id="cb17-4"><a href="#cb17-4" tabindex="-1"></a><span class="co">#&gt; Response: Crime_pers</span></span>
<span id="cb17-5"><a href="#cb17-5" tabindex="-1"></a><span class="co">#&gt;               Sum Sq Df F value    Pr(&gt;F)    </span></span>
<span id="cb17-6"><a href="#cb17-6" tabindex="-1"></a><span class="co">#&gt; Region    1388267847  4  9.0398 5.005e-06 ***</span></span>
<span id="cb17-7"><a href="#cb17-7" tabindex="-1"></a><span class="co">#&gt; Literacy    77140249  1  2.0092    0.1604    </span></span>
<span id="cb17-8"><a href="#cb17-8" tabindex="-1"></a><span class="co">#&gt; Donations   54505520  1  1.4197    0.2372    </span></span>
<span id="cb17-9"><a href="#cb17-9" tabindex="-1"></a><span class="co">#&gt; Infants       102152  1  0.0027    0.9590    </span></span>
<span id="cb17-10"><a href="#cb17-10" tabindex="-1"></a><span class="co">#&gt; Suicides      205432  1  0.0054    0.9419    </span></span>
<span id="cb17-11"><a href="#cb17-11" tabindex="-1"></a><span class="co">#&gt; Residuals 2917886368 76                      </span></span>
<span id="cb17-12"><a href="#cb17-12" tabindex="-1"></a><span class="co">#&gt; ---</span></span>
<span id="cb17-13"><a href="#cb17-13" tabindex="-1"></a><span class="co">#&gt; Signif. codes:  0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1</span></span>
<span id="cb17-14"><a href="#cb17-14" tabindex="-1"></a><span class="fu">Anova</span>(crime.mod2)</span>
<span id="cb17-15"><a href="#cb17-15" tabindex="-1"></a><span class="co">#&gt; Anova Table (Type II tests)</span></span>
<span id="cb17-16"><a href="#cb17-16" tabindex="-1"></a><span class="co">#&gt; </span></span>
<span id="cb17-17"><a href="#cb17-17" tabindex="-1"></a><span class="co">#&gt; Response: Crime_prop</span></span>
<span id="cb17-18"><a href="#cb17-18" tabindex="-1"></a><span class="co">#&gt;              Sum Sq Df F value    Pr(&gt;F)    </span></span>
<span id="cb17-19"><a href="#cb17-19" tabindex="-1"></a><span class="co">#&gt; Region     52269436  4  2.0939 0.0898250 .  </span></span>
<span id="cb17-20"><a href="#cb17-20" tabindex="-1"></a><span class="co">#&gt; Literacy   13366819  1  2.1419 0.1474514    </span></span>
<span id="cb17-21"><a href="#cb17-21" tabindex="-1"></a><span class="co">#&gt; Donations   9218353  1  1.4771 0.2279870    </span></span>
<span id="cb17-22"><a href="#cb17-22" tabindex="-1"></a><span class="co">#&gt; Infants     7577617  1  1.2142 0.2739759    </span></span>
<span id="cb17-23"><a href="#cb17-23" tabindex="-1"></a><span class="co">#&gt; Suicides  100890796  1 16.1665 0.0001355 ***</span></span>
<span id="cb17-24"><a href="#cb17-24" tabindex="-1"></a><span class="co">#&gt; Residuals 474296314 76                      </span></span>
<span id="cb17-25"><a href="#cb17-25" tabindex="-1"></a><span class="co">#&gt; ---</span></span>
<span id="cb17-26"><a href="#cb17-26" tabindex="-1"></a><span class="co">#&gt; Signif. codes:  0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1</span></span></code></pre></div>
<p>These are somewhat disappointing if you look only at the significance
stars: only <code>Region</code> is significant for personal crime and
only <code>Suicides</code> for property crime. There is no evidence for
the argument, supported by the liberal hygienists of Guerry’s time,
that increased <code>Literacy</code> would reduce crime.</p>
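<p>The difference between sequential and partial tests is easiest to see
by fitting the same model with its terms in a different order. The
following is a small sketch using a reduced model; the names
<code>modA</code> and <code>modB</code> are hypothetical and are not
used elsewhere in this vignette.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(Guerry)
library(car)

# The same two predictors, entered in opposite orders
modA &lt;- lm(Crime_pers ~ Region + Literacy, data = Guerry)
modB &lt;- lm(Crime_pers ~ Literacy + Region, data = Guerry)

# Sequential (Type I): each term is adjusted only for the terms before it,
# so the sum of squares for Literacy depends on the order
anova(modA)["Literacy", "Sum Sq"]
anova(modB)["Literacy", "Sum Sq"]

# Partial (Type II): each term is adjusted for all others,
# so the order of terms does not matter
Anova(modA)["Literacy", "Sum Sq"]
Anova(modB)["Literacy", "Sum Sq"]</code></pre></div>
<p>Because the predictors here are correlated, only the partial tests
give an answer that does not depend on how the model formula happens to
be written.</p>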
<p>For such models, we can understand the nature of the predicted
effects using the <code>effects</code> package. The (marginal) effect
for a given term gives the predicted values, averaging over all other
terms in the model.</p>
<div class="sourceCode" id="cb18"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb18-1"><a href="#cb18-1" tabindex="-1"></a><span class="fu">plot</span>(<span class="fu">predictorEffects</span>(crime.mod1, <span class="sc">~</span> Region <span class="sc">+</span> Literacy <span class="sc">+</span> Infants <span class="sc">+</span> Suicides), </span>
<span id="cb18-2"><a href="#cb18-2" tabindex="-1"></a>     <span class="at">lwd=</span><span class="dv">2</span>, <span class="at">main=</span><span class="st">&quot;&quot;</span>)</span></code></pre></div>
<p><img src="" width="850" /></p>
<p>Doing the same for property crime, we get:</p>
<div class="sourceCode" id="cb19"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb19-1"><a href="#cb19-1" tabindex="-1"></a><span class="fu">plot</span>(<span class="fu">predictorEffects</span>(crime.mod2, <span class="sc">~</span> Region <span class="sc">+</span> Literacy <span class="sc">+</span> Infants <span class="sc">+</span> Suicides), </span>
<span id="cb19-2"><a href="#cb19-2" tabindex="-1"></a>     <span class="at">lwd=</span><span class="dv">2</span>, <span class="at">main=</span><span class="st">&quot;&quot;</span>)</span></code></pre></div>
<p><img src="" width="850" /></p>
</div>
<div id="predicting-crime-multivariate-regression" class="section level2">
<h2>Predicting crime: Multivariate regression</h2>
<p>The two regression models can be fit together in a multivariate
regression for both crime variables jointly.</p>
<div class="sourceCode" id="cb20"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb20-1"><a href="#cb20-1" tabindex="-1"></a>crime.mod <span class="ot">&lt;-</span> <span class="fu">lm</span>(<span class="fu">cbind</span>(Crime_pers, Crime_prop) <span class="sc">~</span> </span>
<span id="cb20-2"><a href="#cb20-2" tabindex="-1"></a>                Region <span class="sc">+</span> Literacy <span class="sc">+</span> Donations <span class="sc">+</span>  Infants <span class="sc">+</span> Suicides, <span class="at">data=</span>Guerry)</span>
<span id="cb20-3"><a href="#cb20-3" tabindex="-1"></a><span class="fu">Anova</span>(crime.mod)</span>
<span id="cb20-4"><a href="#cb20-4" tabindex="-1"></a><span class="co">#&gt; </span></span>
<span id="cb20-5"><a href="#cb20-5" tabindex="-1"></a><span class="co">#&gt; Type II MANOVA Tests: Pillai test statistic</span></span>
<span id="cb20-6"><a href="#cb20-6" tabindex="-1"></a><span class="co">#&gt;           Df test stat approx F num Df den Df    Pr(&gt;F)    </span></span>
<span id="cb20-7"><a href="#cb20-7" tabindex="-1"></a><span class="co">#&gt; Region     4   0.42933   5.1936      8    152 9.563e-06 ***</span></span>
<span id="cb20-8"><a href="#cb20-8" tabindex="-1"></a><span class="co">#&gt; Literacy   1   0.03707   1.4434      2     75 0.2425951    </span></span>
<span id="cb20-9"><a href="#cb20-9" tabindex="-1"></a><span class="co">#&gt; Donations  1   0.02615   1.0071      2     75 0.3701736    </span></span>
<span id="cb20-10"><a href="#cb20-10" tabindex="-1"></a><span class="co">#&gt; Infants    1   0.01833   0.7001      2     75 0.4997450    </span></span>
<span id="cb20-11"><a href="#cb20-11" tabindex="-1"></a><span class="co">#&gt; Suicides   1   0.20772   9.8315      2     75 0.0001615 ***</span></span>
<span id="cb20-12"><a href="#cb20-12" tabindex="-1"></a><span class="co">#&gt; ---</span></span>
<span id="cb20-13"><a href="#cb20-13" tabindex="-1"></a><span class="co">#&gt; Signif. codes:  0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1</span></span></code></pre></div>
<p>As a quick check on the assumption that the residuals are bivariate
normally distributed and a check for outliers, a <span class="math inline">\(\chi^2\)</span> Q-Q plot graphs the squared
Mahalanobis distances of the residuals against the corresponding <span class="math inline">\(\chi^2_2\)</span> quantiles these would have in a
bivariate normal distribution. The data for Creuse stands out as a
potential outlier.</p>
<div class="sourceCode" id="cb21"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb21-1"><a href="#cb21-1" tabindex="-1"></a>labels <span class="ot">&lt;-</span> <span class="fu">paste0</span>(Guerry<span class="sc">$</span>dept,<span class="st">&quot;:&quot;</span>, Guerry<span class="sc">$</span>Department)</span>
<span id="cb21-2"><a href="#cb21-2" tabindex="-1"></a><span class="fu">cqplot</span>(crime.mod, <span class="at">id.n=</span><span class="dv">4</span>, <span class="at">labels=</span>labels)</span></code></pre></div>
<p><img src="" /><!-- --></p>
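<p>The idea behind this plot can be sketched by hand in base R. The code
below is a simplified version under the assumption that ordinary
residuals and their sample covariance are used; <code>cqplot()</code>
may differ in details such as the confidence envelope. The names
<code>res</code>, <code>dsq</code> and <code>q</code> are
illustrative.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(Guerry)

crime.mod &lt;- lm(cbind(Crime_pers, Crime_prop) ~
                  Region + Literacy + Donations + Infants + Suicides,
                data = Guerry)

res &lt;- residuals(crime.mod)          # one column of residuals per response
dsq &lt;- mahalanobis(res, colMeans(res), cov(res))

# Theoretical quantiles of a chi-square with df = number of responses
n &lt;- nrow(res)
q &lt;- qchisq(ppoints(n), df = ncol(res))

plot(q, sort(dsq),
     xlab = "Chi-square quantile (2 df)",
     ylab = "Squared Mahalanobis distance")
abline(0, 1, col = "gray")           # reference line for bivariate normality</code></pre></div>
<p>Points that rise well above the reference line at the upper right are
the candidates for outliers.</p>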
<div id="he-plots" class="section level3">
<h3>HE plots</h3>
<p>Hypothesis-Error (HE) plots <span class="citation">(<a href="#ref-Friendly:2007:heplots">Friendly 2007</a>; <a href="#ref-FoxFriendlyMonette:09:compstat">J. Fox, Friendly, and Monette
2009</a>; <a href="#ref-heplots">Friendly, Fox, and Georges Monette
2022</a>)</span> provide a convenient graphical summary of hypothesis
tests in multivariate linear models. They plot a data ellipse for the
residuals in the model, representing the <span class="math inline">\(\mathbf{E}\)</span> matrix in the test statistics
(Roy’s maximum root test, Pillai and Hotelling trace criteria and Wilks’
Lambda). Overlaid on this are <span class="math inline">\(\mathbf{H}\)</span> ellipses for each term in the
model, representing the data ellipses for the fitted values. Using Roy’s
test, these have a convenient interpretation: a term is significant
<em>iff</em> the H ellipse projects anywhere outside the E ellipse. For
a 1 df (regression) variable, the H ellipse collapses to a line.</p>
<div class="sourceCode" id="cb22"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb22-1"><a href="#cb22-1" tabindex="-1"></a><span class="fu">heplot</span>(crime.mod, </span>
<span id="cb22-2"><a href="#cb22-2" tabindex="-1"></a>       <span class="at">fill=</span><span class="cn">TRUE</span>, <span class="at">fill.alpha=</span><span class="fl">0.05</span>, </span>
<span id="cb22-3"><a href="#cb22-3" tabindex="-1"></a>       <span class="at">cex=</span><span class="fl">1.4</span>, <span class="at">cex.lab=</span><span class="fl">1.3</span> )</span></code></pre></div>
<p><img src="" /><!-- --></p>
<p>In this plot, the effect of <code>Suicides</code> is completely
aligned with crimes against property. The effect of <code>Region</code>
is positively correlated with both types of crime. The means for the
regions show that the South of France is lower (worse) on personal
crime; the other regions vary most in property crime, with the North
being lower and the Center being higher.</p>
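<p>The <span class="math inline">\(\mathbf{H}\)</span> and <span class="math inline">\(\mathbf{E}\)</span> matrices that an HE plot
displays can be computed by hand for a simple case. The sketch below
assumes a one-way model fitted to the built-in <code>iris</code> data
(not the <code>Guerry</code> model), where <span class="math inline">\(\mathbf{H}\)</span> is the sum of squares and
cross-products of the fitted values about the grand means. In models
with several terms, the <span class="math inline">\(\mathbf{H}\)</span>
matrix for each term depends on the type of test used, which this
sketch does not cover.</p>

```r
# Illustrative only: built-in iris data stand in for the Guerry data.
Y   <- as.matrix(iris[, c("Sepal.Length", "Petal.Length")])
mod <- lm(Y ~ Species, data = iris)

# E: sums of squares and cross-products (SSP) of the residuals
E <- crossprod(residuals(mod))

# H for the single term Species: SSP of the fitted values
# taken about the grand means of the responses
Yhat <- fitted(mod)
H <- crossprod(sweep(Yhat, 2, colMeans(Y)))

# Eigenvalues of E^{-1} H; the largest one is the basis of
# Roy's maximum root test
lambda <- Re(eigen(solve(E) %*% H)$values)
lambda
```

<p>In this one-way case <code>H + E</code> equals the total SSP matrix
of the centered responses, which gives a quick check on the
computation.</p>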
</div>
<div id="canonical-plots" class="section level3">
<h3>Canonical plots</h3>
<p>The HE plot displays these relations in <strong>data space</strong>.
An alternative is provided by canonical discriminant analysis, which
finds the weighted sums of the response variables leading to the largest
test statistics for the terms, which can be visualized in
<strong>canonical space</strong>.</p>
<p>The analysis below reflects the effect of <code>Region</code> in
relation to both crime variables.</p>
<div class="sourceCode" id="cb23"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb23-1"><a href="#cb23-1" tabindex="-1"></a>crime.can <span class="ot">&lt;-</span> <span class="fu">candisc</span>(crime.mod)</span>
<span id="cb23-2"><a href="#cb23-2" tabindex="-1"></a>crime.can</span>
<span id="cb23-3"><a href="#cb23-3" tabindex="-1"></a><span class="co">#&gt; </span></span>
<span id="cb23-4"><a href="#cb23-4" tabindex="-1"></a><span class="co">#&gt; Canonical Discriminant Analysis for Region:</span></span>
<span id="cb23-5"><a href="#cb23-5" tabindex="-1"></a><span class="co">#&gt; </span></span>
<span id="cb23-6"><a href="#cb23-6" tabindex="-1"></a><span class="co">#&gt;     CanRsq Eigenvalue Difference Percent Cumulative</span></span>
<span id="cb23-7"><a href="#cb23-7" tabindex="-1"></a><span class="co">#&gt; 1 0.337068    0.50845    0.40681   83.34      83.34</span></span>
<span id="cb23-8"><a href="#cb23-8" tabindex="-1"></a><span class="co">#&gt; 2 0.092267    0.10164    0.40681   16.66     100.00</span></span>
<span id="cb23-9"><a href="#cb23-9" tabindex="-1"></a><span class="co">#&gt; </span></span>
<span id="cb23-10"><a href="#cb23-10" tabindex="-1"></a><span class="co">#&gt; Test of H0: The canonical correlations in the </span></span>
<span id="cb23-11"><a href="#cb23-11" tabindex="-1"></a><span class="co">#&gt; current row and all that follow are zero</span></span>
<span id="cb23-12"><a href="#cb23-12" tabindex="-1"></a><span class="co">#&gt; </span></span>
<span id="cb23-13"><a href="#cb23-13" tabindex="-1"></a><span class="co">#&gt;   LR test stat approx F numDF denDF   Pr(&gt; F)    </span></span>
<span id="cb23-14"><a href="#cb23-14" tabindex="-1"></a><span class="co">#&gt; 1      0.60177   5.7097     8   158 2.209e-06 ***</span></span>
<span id="cb23-15"><a href="#cb23-15" tabindex="-1"></a><span class="co">#&gt; 2      0.90773   2.7105     3    80   0.05051 .  </span></span>
<span id="cb23-16"><a href="#cb23-16" tabindex="-1"></a><span class="co">#&gt; ---</span></span>
<span id="cb23-17"><a href="#cb23-17" tabindex="-1"></a><span class="co">#&gt; Signif. codes:  0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1</span></span></code></pre></div>
<p>The HE plot for this analysis is shown below. The variable vectors
represent the correlations of the crime variables with the canonical
dimensions.</p>
<div class="sourceCode" id="cb24"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb24-1"><a href="#cb24-1" tabindex="-1"></a><span class="fu">heplot</span>(crime.can, <span class="at">fill=</span><span class="cn">TRUE</span>, <span class="at">fill.alpha=</span><span class="fl">0.1</span>,</span>
<span id="cb24-2"><a href="#cb24-2" tabindex="-1"></a>       <span class="at">var.col =</span> <span class="st">&quot;black&quot;</span>, </span>
<span id="cb24-3"><a href="#cb24-3" tabindex="-1"></a>       <span class="at">var.cex =</span> <span class="fl">1.3</span>,</span>
<span id="cb24-4"><a href="#cb24-4" tabindex="-1"></a>       <span class="at">cex=</span><span class="fl">1.4</span>, <span class="at">cex.lab=</span><span class="fl">1.3</span>)</span></code></pre></div>
<p><img src="" /><!-- --></p>
<pre><code>#&gt; Vector scale factor set to  3.10537</code></pre>
<p>This gives a simple interpretation of the differences in
<code>Region</code> on the crime variables. The first canonical
dimension accounts for 83% of differences among the regions, and this is
nearly perfectly aligned with personal crime, with the largest
difference between the South and the other regions. The second canonical
dimension, accounting for the remaining 17%, is perfectly aligned with
property crime. On this dimension, the North stands out compared to the
other regions.</p>
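<p>The summary columns in the printed <code>candisc</code> output
follow from the two eigenvalues alone. As a rough check, the sketch
below recomputes them from the rounded eigenvalues shown above, so the
results agree with the printed table only up to rounding. The variable
names are hypothetical.</p>

```r
# Eigenvalues copied from the printed candisc output for Region
lambda <- c(0.50845, 0.10164)

# Squared canonical correlations: lambda / (1 + lambda)
can_rsq <- lambda / (1 + lambda)

# Share of the sum of eigenvalues carried by each dimension, in percent
percent <- 100 * lambda / sum(lambda)

# Likelihood ratio statistic (Wilks' Lambda) for the current and all
# later dimensions: the product of 1 / (1 + lambda) over those dimensions
wilks <- rev(cumprod(rev(1 / (1 + lambda))))

# Compare with the CanRsq, Percent and "LR test stat" columns
round(cbind(CanRsq = can_rsq, Percent = percent, LR = wilks), 4)
```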
</div>
</div>
</div>
<div id="references" class="section level1 unnumbered">
<h1 class="unnumbered">References</h1>
<div id="refs" class="references csl-bib-body hanging-indent">
<div id="ref-FoxFriendlyMonette:09:compstat" class="csl-entry">
Fox, J., M. Friendly, and G. Monette. 2009. <span>“Visualizing
Hypothesis Tests in Multivariate Linear Models: The <em>Heplots</em>
Package for <span>R</span>.”</span> <em>Computational Statistics</em> 24
(2): 233–46. <a href="https://datavis.ca/papers/FoxFriendlyMonette-2009.pdf">https://datavis.ca/papers/FoxFriendlyMonette-2009.pdf</a>.
</div>
<div id="ref-R-car" class="csl-entry">
Fox, John, Sanford Weisberg, and Brad Price. 2023. <em>Car: Companion to
Applied Regression</em>. <a href="https://r-forge.r-project.org/projects/car/">https://r-forge.r-project.org/projects/car/</a>.
</div>
<div id="ref-Friendly:02:corrgram" class="csl-entry">
Friendly, Michael. 2002. <span>“Corrgrams: Exploratory Displays for
Correlation Matrices.”</span> <em>The American Statistician</em> 56 (4):
316–24. <a href="http://datavis.ca/papers/corrgram.pdf">http://datavis.ca/papers/corrgram.pdf</a>.
</div>
<div id="ref-Friendly:2007:heplots" class="csl-entry">
———. 2007. <span>“HE Plots for Multivariate General Linear
Models.”</span> <em>Journal of Computational and Graphical
Statistics</em> 16 (4): 421–44. <a href="http://datavis.ca/papers/jcgs-heplots.pdf">http://datavis.ca/papers/jcgs-heplots.pdf</a>.
</div>
<div id="ref-heplots" class="csl-entry">
Friendly, Michael, John Fox, and Georges Monette. 2022. <em><span class="nocase">heplots</span>: Visualizing Tests in Multivariate Linear
Models</em>. <a href="https://CRAN.R-project.org/package=heplots">https://CRAN.R-project.org/package=heplots</a>.
</div>
<div id="ref-Galton:1886" class="csl-entry">
Galton, Francis. 1886. <span>“Regression Towards Mediocrity in
Hereditary Stature.”</span> <em>Journal of the Anthropological
Institute</em> 15: 246–63.
</div>
<div id="ref-Guerry:1833" class="csl-entry">
Guerry, André-Michel. 1833. <em>Essai Sur La Statistique Morale de La
<span>France</span></em>. Paris: Crochard.
</div>
<div id="ref-R-GGally" class="csl-entry">
Schloerke, Barret, Di Cook, Joseph Larmarange, Francois Briatte, Moritz
Marbach, Edwin Thoen, Amos Elberg, and Jason Crowley. 2021. <em>GGally:
Extension to Ggplot2</em>. <a href="https://ggobi.github.io/ggally/">https://ggobi.github.io/ggally/</a>.
</div>
<div id="ref-R-ggbiplot" class="csl-entry">
Vu, Vincent, and Michael Friendly. 2023. <em>Ggbiplot: A Ggplot2 Based
Biplot</em>. <a href="https://github.com/friendly/ggbiplot">https://github.com/friendly/ggbiplot</a>.
</div>
<div id="ref-R-corrgram" class="csl-entry">
Wright, Kevin. 2021. <em>Corrgram: Plot a Correlogram</em>. <a href="https://kwstat.github.io/corrgram/">https://kwstat.github.io/corrgram/</a>.
</div>
</div>
</div>



<!-- code folding -->


<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
  (function () {
    var script = document.createElement("script");
    script.type = "text/javascript";
    script.src  = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
    document.getElementsByTagName("head")[0].appendChild(script);
  })();
</script>

</body>
</html>
